San Francisco Housing Rental Analysis

In this assignment, you will perform basic analysis for the San Francisco Housing Market to allow potential real estate investors to choose rental investment properties.

In [1]:
# initial imports
import os
import pandas as pd
import matplotlib.pyplot as plt
import hvplot.pandas
import plotly.express as px
from pathlib import Path
from dotenv import load_dotenv
import panel as pn

%matplotlib inline
Bad key "text.kerning_factor" on line 4 in
C:\Users\andre\anaconda3\envs\pyvizenv\lib\site-packages\matplotlib\mpl-data\stylelib\_classic_test_patch.mplstyle.
You probably need to get an updated matplotlibrc file from
http://github.com/matplotlib/matplotlib/blob/master/matplotlibrc.template
or from the matplotlib source distribution
In [2]:
pn.extension('plotly')
WARNING:param.panel_extension: A HoloViz extension was loaded previously. This means the extension is already initialized and the following Panel extensions could not be properly loaded: ['plotly']. If you are loading custom extensions with pn.extension(...) ensure that this is called before any other HoloViz extension such as hvPlot or HoloViews.
In [ ]:
 
In [3]:
# Read the Mapbox API key
load_dotenv()
mapbox_token = os.getenv("MAPBOX")

Load Data

In [4]:
# Read the census data into a Pandas DataFrame
file_path = Path("Data/sfo_neighborhoods_census_data.csv")
sfo_data = pd.read_csv(file_path, index_col="year")
sfo_data.head()
Out[4]:
neighborhood sale_price_sqr_foot housing_units gross_rent
year
2010 Alamo Square 291.182945 372560 1239
2010 Anza Vista 267.932583 372560 1239
2010 Bayview 170.098665 372560 1239
2010 Buena Vista Park 347.394919 372560 1239
2010 Central Richmond 319.027623 372560 1239

Housing Units Per Year

In this section, you will calculate the number of housing units per year and visualize the results as a bar chart using the Pandas plot function.

Hint: Use the Pandas groupby function

Optional challenge: Use the min, max, and std to scale the y limits of the chart.

In [5]:
# Calculate the mean number of housing units per year (hint: use groupby) 
# YOUR CODE HERE!
In [6]:
sfo_data.groupby(level=0).housing_units.mean()
Out[6]:
year
2010    372560
2011    374507
2012    376454
2013    378401
2014    380348
2015    382295
2016    384242
Name: housing_units, dtype: int64
In [7]:
# Use the Pandas plot function to plot the average housing units per year.
# Note: You will need to manually adjust the y limit of the chart using the min and max values from above.
# YOUR CODE HERE!

# Optional Challenge: Use the min, max, and std to scale the y limits of the chart
# YOUR CODE HERE!


#
In [8]:
fig_housing_units =sfo_data.groupby(level=0).housing_units.mean()
ax =fig_housing_units.plot(kind= 'bar', ylim= [370000,388500], title=' Housing Units in San Fransisco from 2010 to 2016')
ax.set_ylabel('Housing Units')
ax.set_xlabel('Year')
Out[8]:
Text(0.5, 0, 'Year')

Average Prices per Square Foot

In this section, you will calculate the average gross rent and average sales price for each year. Plot the results as a line chart.

Average Gross Rent in San Francisco Per Year

In [9]:
# Calculate the average gross rent and average sale price per square foot
# YOUR CODE HERE!
In [10]:
drop_column_sfo_data=sfo_data.drop('housing_units',1)
drop_column_sfo_data.groupby(level=0).mean()
Out[10]:
sale_price_sqr_foot gross_rent
year
2010 369.344353 1239
2011 341.903429 1530
2012 399.389968 2324
2013 483.600304 2971
2014 556.277273 3528
2015 632.540352 3739
2016 697.643709 4390
In [11]:
# Plot the Average Gross Rent per Year as a Line Chart 
# YOUR CODE HERE!
In [12]:
average_gross_rent=drop_column_sfo_data.groupby(level=0).gross_rent.mean()
ax=average_gross_rent.plot(kind='line',title='Average Gross Rent in San Francisco')
ax.set_xlabel('Year')
ax.set_ylabel('Gross Rent')
Out[12]:
Text(0, 0.5, 'Gross Rent')

Average Sales Price per Year

In [13]:
# Plot the Average Sales Price per Year as a line chart
# YOUR CODE HERE!
In [14]:
average_sales_price= drop_column_sfo_data.groupby(level=0).sale_price_sqr_foot.mean()
ax = average_sales_price.plot(kind='line', title= 'Average Sales Price')
ax.set_xlabel('Year')
ax.set_ylabel('Avg. Sale Price')
Out[14]:
Text(0, 0.5, 'Avg. Sale Price')

Average Prices by Neighborhood

In this section, you will use hvplot to create an interactive visulization of the Average Prices with a dropdown selector for the neighborhood.

Hint: It will be easier to create a new DataFrame from grouping the data and calculating the mean prices for each year and neighborhood

In [15]:
# Group by year and neighborhood and then create a new dataframe of the mean values
# YOUR CODE HERE!
In [16]:
new_sfo_data=sfo_data.reset_index()
new_sfo_data.groupby(['year', 'neighborhood'], as_index=False).mean().head(10).inplace=True
new_sfo_data.head(10)
Out[16]:
year neighborhood sale_price_sqr_foot housing_units gross_rent
0 2010 Alamo Square 291.182945 372560 1239
1 2010 Anza Vista 267.932583 372560 1239
2 2010 Bayview 170.098665 372560 1239
3 2010 Buena Vista Park 347.394919 372560 1239
4 2010 Central Richmond 319.027623 372560 1239
5 2010 Central Sunset 418.172493 372560 1239
6 2010 Corona Heights 369.359338 372560 1239
7 2010 Cow Hollow 569.379968 372560 1239
8 2010 Croker Amazon 165.645730 372560 1239
9 2010 Diamond Heights 456.930822 372560 1239
In [17]:
# Use hvplot to create an interactive line chart of the average price per sq ft.
# The plot should have a dropdown selector for the neighborhood
# YOUR CODE HERE!
In [18]:
new_sfo_data.hvplot.line(x= 'year', y="sale_price_sqr_foot", groupby= 'neighborhood')
Out[18]:

The Top 10 Most Expensive Neighborhoods

In this section, you will need to calculate the mean sale price for each neighborhood and then sort the values to obtain the top 10 most expensive neighborhoods on average. Plot the results as a bar chart.

In [19]:
# Getting the data from the top 10 expensive neighborhoods
# YOUR CODE HERE!
In [20]:
drop_year_new_sfo_data=new_sfo_data.drop('year',1)
most_expensive_neighborhood= drop_year_new_sfo_data.groupby(['neighborhood'], as_index=False).mean()
most_expensive_neighborhood.nlargest(10, columns='sale_price_sqr_foot')
Out[20]:
neighborhood sale_price_sqr_foot housing_units gross_rent
65 Union Square District 903.993258 377427.50 2555.166667
36 Merced Heights 788.844818 380348.00 3414.000000
38 Miraloma Park 779.810842 375967.25 2155.250000
51 Pacific Heights 689.555817 378401.00 2817.285714
71 Westwood Park 687.087575 382295.00 3959.000000
63 Telegraph Hill 676.506578 378401.00 2817.285714
57 Presidio Heights 675.350212 378401.00 2817.285714
10 Cow Hollow 665.964042 378401.00 2817.285714
56 Potrero Hill 662.013613 378401.00 2817.285714
60 South Beach 650.124479 375805.00 2099.000000
In [21]:
# Plotting the data from the top 10 expensive neighborhoods
# YOUR CODE HERE!
In [22]:
plot_m_e_n= most_expensive_neighborhood.nlargest(10, columns='sale_price_sqr_foot')
plot_m_e_n.hvplot.bar(x='neighborhood', y='sale_price_sqr_foot', rot=90 )
Out[22]:

Parallel Coordinates and Parallel Categories Analysis

In this section, you will use plotly express to create parallel coordinates and parallel categories visualizations so that investors can interactively filter and explore various factors related to the sales price of the neighborhoods.

Using the DataFrame of Average values per neighborhood (calculated above), create the following visualizations:

  1. Create a Parallel Coordinates Plot
  2. Create a Parallel Categories Plot
In [23]:
# Parallel Coordinates Plot
# YOUR CODE HERE!
In [ ]:
 
In [24]:
px.parallel_coordinates(plot_m_e_n, color='sale_price_sqr_foot')
In [25]:
# Parallel Categories Plot
# YOUR CODE HERE!
In [26]:
px.parallel_categories(plot_m_e_n, dimensions=['neighborhood', 'sale_price_sqr_foot', 'housing_units', 'gross_rent'], color='sale_price_sqr_foot', color_continuous_scale=px.colors.sequential.Inferno)

Neighborhood Map

In this section, you will read in neighboor location data and build an interactive map with the average prices per neighborhood. Use a scatter_mapbox from plotly express to create the visualization. Remember, you will need your mapbox api key for this.

Load Location Data

In [27]:
# Load neighborhoods coordinates data
file_path = Path("Data/neighborhoods_coordinates.csv")
df_neighborhood_locations = pd.read_csv(file_path)
df_neighborhood_locations.head()
Out[27]:
Neighborhood Lat Lon
0 Alamo Square 37.791012 -122.402100
1 Anza Vista 37.779598 -122.443451
2 Bayview 37.734670 -122.401060
3 Bayview Heights 37.728740 -122.410980
4 Bernal Heights 37.728630 -122.443050

Data Preparation

You will need to join the location data with the mean prices per neighborhood

  1. Calculate the mean values for each neighborhood
  2. Join the average values with the neighborhood locations
In [28]:
# Calculate the mean values for each neighborhood
# YOUR CODE HERE!
In [29]:
mean_value_neighborhood=drop_year_new_sfo_data.groupby(['neighborhood'], as_index=False).mean()
mean_value_neighborhood.head(10)
Out[29]:
neighborhood sale_price_sqr_foot housing_units gross_rent
0 Alamo Square 366.020712 378401.0 2817.285714
1 Anza Vista 373.382198 379050.0 3031.833333
2 Bayview 204.588623 376454.0 2318.400000
3 Bayview Heights 590.792839 382295.0 3739.000000
4 Bernal Heights 576.746488 379374.5 3080.333333
5 Buena Vista Park 452.680591 378076.5 2698.833333
6 Central Richmond 394.422399 378401.0 2817.285714
7 Central Sunset 423.687928 378401.0 2817.285714
8 Clarendon Heights 487.244886 376454.0 2250.500000
9 Corona Heights 587.539067 377232.8 2472.000000
In [30]:
# Join the average values with the neighborhood locations
# YOUR CODE HERE!
In [31]:
average_value_per_neighborhood=pd.concat([df_neighborhood_locations, mean_value_neighborhood], axis=1).head(10)
average_value_per_neighborhood=average_value_per_neighborhood.drop('neighborhood', 1)
average_value_per_neighborhood.head(10)
Out[31]:
Neighborhood Lat Lon sale_price_sqr_foot housing_units gross_rent
0 Alamo Square 37.791012 -122.402100 366.020712 378401.0 2817.285714
1 Anza Vista 37.779598 -122.443451 373.382198 379050.0 3031.833333
2 Bayview 37.734670 -122.401060 204.588623 376454.0 2318.400000
3 Bayview Heights 37.728740 -122.410980 590.792839 382295.0 3739.000000
4 Bernal Heights 37.728630 -122.443050 576.746488 379374.5 3080.333333
5 Buena Vista Park 37.768160 -122.439330 452.680591 378076.5 2698.833333
6 Central Richmond 37.777890 -122.445170 394.422399 378401.0 2817.285714
7 Central Sunset 37.749610 -122.489990 423.687928 378401.0 2817.285714
8 Clarendon Heights 37.753310 -122.447030 487.244886 376454.0 2250.500000
9 Corona Heights 37.785530 -122.456000 587.539067 377232.8 2472.000000

Mapbox Visualization

Plot the aveage values per neighborhood with a plotly express scatter_mapbox visualization.

In [32]:
# Create a scatter mapbox to analyze neighborhood info
# YOUR CODE HERE!
In [33]:
px.scatter_mapbox(average_value_per_neighborhood, lat='Lat', lon='Lon', size='sale_price_sqr_foot', color='gross_rent')
In [34]:
sfo_data.head()
Out[34]:
neighborhood sale_price_sqr_foot housing_units gross_rent
year
2010 Alamo Square 291.182945 372560 1239
2010 Anza Vista 267.932583 372560 1239
2010 Bayview 170.098665 372560 1239
2010 Buena Vista Park 347.394919 372560 1239
2010 Central Richmond 319.027623 372560 1239
In [35]:
# Define Panel Visualization Functions
def housing_units_per_year():
    fig_housing_units =sfo_data.groupby(level=0).housing_units.mean()
    plot_housing_units = plt.figure()
    ax =fig_housing_units.plot(kind= 'bar', ylim= [370000,388500], title=' Housing Units in San Fransisco from 2010 to 2016')
    ax.set_ylabel('Housing Units')
    ax.set_xlabel('Year') 
    plt.close(plot_housing_units)
    return pn.pane.Matplotlib(plot_housing_units, tight=True)

    
    # YOUR CODE HERE!


def average_gross_rent():
    average_gross_rent=drop_column_sfo_data.groupby(level=0).gross_rent.mean()
    fig_average_gross_rent = plt.figure()
    ax=average_gross_rent.plot(kind='line',title='Average Gross Rent in San Francisco')
    ax.set_xlabel('Year')
    ax.set_ylabel('Gross Rent')
    plt.close(fig_average_gross_rent)
    return pn.pane.Matplotlib(fig_average_gross_rent, tight=True)
   
    # YOUR CODE HERE!


def average_sales_price():
    average_sales_price= drop_column_sfo_data.groupby(level=0).sale_price_sqr_foot.mean()
    fig_average_sales_price = plt.figure()
    ax = average_sales_price.plot(kind='line', title= 'Average Sales Price')
    ax.set_xlabel('Year')
    ax.set_ylabel('Avg. Sale Price')
    plt.close(fig_average_sales_price)
    return pn.pane.Matplotlib(fig_average_sales_price, tight=True)
   
    # YOUR CODE HERE!


def average_price_by_neighborhood():
    
    return new_sfo_data.hvplot.line(x= 'year', y="sale_price_sqr_foot", groupby= 'neighborhood')
    
    # YOUR CODE HERE!


def top_most_expensive_neighborhoods():
    plot_m_e_n= most_expensive_neighborhood.nlargest(10, columns='sale_price_sqr_foot')
    return plot_m_e_n.hvplot.bar(x='neighborhood', y='sale_price_sqr_foot', rot=90 )
 
    # YOUR CODE HERE!


def parallel_coordinates():
    return px.parallel_coordinates(plot_m_e_n, color='sale_price_sqr_foot')
  
    
    # YOUR CODE HERE!


def parallel_categories():
    return px.parallel_categories(plot_m_e_n, dimensions=['neighborhood', 'sale_price_sqr_foot', 'housing_units', 'gross_rent'], color='sale_price_sqr_foot', color_continuous_scale=px.colors.sequential.Inferno)
   
    
    # YOUR CODE HERE!


def neighborhood_map():
    return px.scatter_mapbox(average_value_per_neighborhood, lat='Lat', lon='Lon', size='sale_price_sqr_foot', color='gross_rent')
    
    
    # YOUR CODE HERE!
In [36]:
pn.extension()
In [40]:
row1 = pn.Row(average_price_by_neighborhood, average_gross_rent, average_sales_price)
row2 = pn.Row( parallel_coordinates)
row3 = pn.Row(top_most_expensive_neighborhoods)
row4 = pn.Row(parallel_categories)

summary = pn.pane.Markdown("""
# Summary Finding
### My Map

""")
In [42]:
title = pn.pane.Markdown("# Real Estate Analysis")
tabs = pn.Tabs (
    ("Averages", pn.Column(row1)),
    ("Top most Expensive and Parallel Graphs", pn.Column(row3,row2, row4)),
    ("Neighborhood Map", pn.Column(summary, neighborhood_map))
)

pn.Column(title, tabs)
Out[42]:
In [ ]:
 
In [ ]: